EPJ manuscript No. 

(will be inserted by the editor) 



Communication activity in social networks: growth and 
correlations 

Diego Rybski 1,2 , Sergey V. Buldyrev 3 , Shlomo Havlin 4 , Fredrik Liljeros 5 , and Hernan A. Makse 1 

1 Levich Institute and Physics Department, City College of New York, New York, NY 10031, USA 

2 Potsdam Institute for Climate Impact Research (PIK), P.O. Box 60 12 03, 14412 Potsdam, Germany 

3 Department of Physics, Yeshiva University, New York, NY 10033, USA 

4 Department of Physics, Bar-Ilan University, Ramat-Gan 52900, Israel 

5 Department of Sociology, Stockholm University, S-10691 Stockholm, Sweden 

September 21, 2011 [version 12] 

Abstract. We investigate the timing of messages sent in two online communities with respect to growth 
fluctuations and long-term correlations. We find that the timing of sending and receiving messages com- 
prises pronounced long-term persistence. Considering the activity of the community members as growing 
entities, i.e. the cumulative number of messages sent (or received) by the individuals, we identify non-trivial 
scaling in the growth fluctuations which we relate to the long-term correlations. We find a connection be- 
tween the scaling exponents of the growth and the long-term correlations which is supported by numerical 
simulations based on peaks over threshold. In addition, we find that the activity on directed links between 
pairs of members exhibits long-term correlations, indicating that communication activity with the most 
liked partners may be responsible for the long-term persistence in the timing of messages. Finally, we 
show that the number of messages, M, and the number of communication partners, K, of the individual 
members are correlated following a power-law, $K \sim M^{\lambda}$, with exponent $\lambda \approx 3/4$.
1 Introduction

Seeking simple laws and regularities in human activity,
researchers from various disciplines aim to study
social phenomena by describing them with methods from
the natural sciences. Since communication plays a predominant
role in social systems, it is desirable to obtain better
insight into the nature of communication patterns, and
thereby to understand both communication itself and
the social systems. Although it is clear that communication
is related to the embedding in social networks, the
actual dynamical processes are still poorly understood.

Studying economic data, surprising growth patterns
have been identified [1], which seem to be abundant in
systems with growth-like features [2,3,4,5,6,7,8]. Considering
the units of a system of interest and calculating their
logarithmic growth rates between two time steps, it was
found that the standard deviation of the growth rates decays
as a power-law with the initial size [1]. This finding
represents a violation of Gibrat's law [9,10,11], which states that
the average and the standard deviation of the growth rate
of a given economic indicator are constant and independent
of the specific indicator value, see also [7].

In a recent study [12] we found several scaling
laws characterizing the communication activity in online
social networks. We found long-term correlations
in the human activity of sending messages to other
members of the social network. The long-term persistence
is related to the fluctuations in the growth properties of
the social network as measured by the cumulative number
of messages sent by the members. The present paper
expands this previous work by studying the messages sent
in two online social networks with respect to the following
properties. First, we extend the results obtained in
[12], revealing analogous correlations in the timing of
receiving messages. Furthermore, we analyze the temporal
correlations of the activity on directed links, i.e. between
pairs of members, and find almost identical results as on
the level of the single members.

Second, in line with [12] we study the growth of the cumulative
communication activity of the members in terms
of the cumulative numbers of messages sent and received.
In [12] we have shown that the standard deviation of the
growth rates of the cumulative number of messages sent
by individuals depends on the 'size' of the member (defined
as the cumulative number of messages) following a
power-law with exponent $\beta \approx 0.2$, significantly different
from the random exponent $\beta_{\rm rnd} = 1/2$, indicating non-trivial
fluctuations and persistence in the human communication
activity in the social networks. Here we further
study the distribution of the logarithmic growth rates and
find exponential decays similar to those encountered in
econophysics [1].

Third, in order to understand the relation between the
long-term correlations and growth fluctuations, we propose
a simulation approach based on peaks over threshold
modeling. Using artificially generated long-term correlated
sequences, a message is sent when the record exceeds
a predefined threshold. Numerically, we measure the long-term
correlations characterized by the exponent $H$ as well
as growth fluctuations characterized by $\beta$ and find that
the relation connecting both exponents proposed in [12]
holds.

Fourth, we introduce a new growth rate between any 
pair of members quantifying the mutual growth in the 
number of messages. We find that the corresponding growth 
fluctuations follow a power-law with similar exponent as 
for the 'normal' growth rates. We argue that the exponent
might be related to cross-correlations in the activity
of the members.

Fifth, in addition to the temporal correlations, we in- 
vestigate the total number of messages sent or received 
and the total in- and out-degree (i.e. the number of differ- 
ent members from which a member receives or to whom 
he/she sends). We find that the total degree and the final 
number of messages are correlated following a power-law 
with exponent close to 0.75. In the case of final in- vs. out- 
degree, deviations from the linear correlations are found. 

Finally, we point out that there is also a relation between
our results on growth fluctuations and long-range
correlations ($\beta$ and $H$, respectively) and the existence of
power-law distributed inter-event times characterized by
the exponent $\delta$ [13], leading to the clustering and bursts in
the activity of members. This connection is explored in a
follow-up paper [14].

Our results have important implications for the design
of communication systems. The correlations can be exploited
to better predict information propagation, see e.g.
[15]. In addition, the characterization of fluctuations is
essential for the knowledge of uncertainty. Our approach
could also be applied in natural systems such as in the
context of protein unfolding [16].

This paper is organized as follows. In Sec. 2 we briefly
describe the data of messages sent in two online communities.
Our results are presented in Sec. 3, which is organized
in four sub-sections discussing long-term correlations,
growth fluctuations, modeling, and other correlations.
Finally, we draw our conclusions in Sec. 4.



2 Data 

We analyze the timing of messages sent in two Internet
communities [12,17]. The data of the first online community
(www.qx.se, QX) consists of over 80,000 members
and more than 12.5 million messages sent during 63 days
(mid November 2005 until mid January 2006). The data
of the second online community (www.pussokram.com,
POK) covers 492 days (February 2001 until June 2002)
of activity with more than 500,000 messages sent among
almost 30,000 members [18,19,20]. This corresponds to
the entire lifespan of the social network. Both web-sites
are used for dating and general social interactions. The
QX community is used mainly by Swedish gay and lesbian
people, while POK was targeted at Swedish teenagers and
young adults. All data are completely anonymous, lack
any message content and consist only of the time when 
the messages are sent and identification numbers of the 
senders and receivers. The advantage of these data sets is 
that they provide the exact time when the messages were 
sent - in contrast to similar network data sets consisting 
only of snapshots, i.e. temporally aggregated social net- 
works expressing who sent messages to whom (see [17] for 
a discussion). 

Similarly to other online communities, the members 
can log in and meet virtually. There are different ways of 
interacting in these communities. Common among most 
of such online communities is the possibility to choose 
favorites, i.e. a list of other members, that a person some- 
how feels committed to. In addition, the platforms offer 
the possibility to join groups and discuss with other mem- 
bers about specific topics. We focus on the messages sent 
among the members. These messages are similar to e-mails 
but have the advantage that they are sent within a closed 
community where there are no messages coming from or 
going outside. 

From the message data one can also build networks, 
which consist of links connecting nodes. We consider the 
members as nodes and set a directed link from node a 
to b when member a sends at least one message to b. 
The degree, $k$, of a node is the number of other nodes it is
connected to, i.e. the number of links it has. In the directed 
case one distinguishes between out-degree (number of out- 
going links) and in-degree (number of in-going links). 
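The network construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the message log is taken to be a list of `(time, sender, receiver)` tuples, and the function name `degrees` is hypothetical, not part of the original analysis.

```python
from collections import defaultdict

def degrees(messages):
    """Out- and in-degree per member from (time, sender, receiver) tuples."""
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for _, a, b in messages:
        # a directed link a -> b exists once a has sent b at least one message
        out_nbrs[a].add(b)
        in_nbrs[b].add(a)
    members = set(out_nbrs) | set(in_nbrs)
    k_out = {m: len(out_nbrs[m]) for m in members}
    k_in = {m: len(in_nbrs[m]) for m in members}
    return k_out, k_in

# toy log (assumed format): repeated messages on a link do not raise the degree
msgs = [(0, "a", "b"), (1, "a", "b"), (2, "b", "a"), (3, "a", "c")]
k_out, k_in = degrees(msgs)
```

Using sets of neighbours makes the distinction between the number of messages and the number of communication partners explicit.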



The study of the de- identified dating site network data was 
approved by the Regional Ethical Review board in Stockholm, 
record 2005/5:3. 

Fig. 1. Comparison of fluctuation functions in (a) daily and
(b) weekly resolution of members sending messages in POK.
The different curves correspond to different activity levels:
M = 1-2, 3-7, 8-20, 21-54, 55-148, 149-403, 404-1096, 1097-2980
total messages (from bottom to top). The curves in (b)
have been shifted along the $\Delta t$ axis to match daily resolution.
In both cases the asymptotic scaling is the same. The dotted
lines correspond to the exponents $H = 1$ (top) and $H = 1/2$
(bottom).
Fig. 2. Fluctuation exponents of the communication activity
(a) sending and (b) receiving messages by members of QX.
The exponents are plotted as a function of the activity level M,
i.e. total number of messages, for the original data (green circles),
and individually shuffled sequences (orange diamonds).
See also [12].

3 Analysis 

3.1 Long-term correlations 

First, we define the activity record, $\mu_j(t)$, counting the
number of messages member j sends at day/week t. Thus,
we study the activity that is aggregated at the daily or 
weekly level. This is done to avoid possible oscillations 
that are observed in the data at both frequencies. 

In a previous study [12] we applied Detrended
Fluctuation Analysis (DFA) [21,22,23] and found that
the activity records, $\mu(t)$, exhibit long-term correlations,
which are characterized by a power-law decaying auto-correlation
function,

$$C(\Delta t) = \frac{1}{\sigma_\mu^2} \left\langle \left[\mu(t) - \langle\mu(t)\rangle\right] \left[\mu(t+\Delta t) - \langle\mu(t)\rangle\right] \right\rangle \sim (\Delta t)^{-\gamma} ,$$

where $\langle\mu(t)\rangle$ is the average of the record $\mu(t)$, $\sigma_\mu$ is its
standard deviation, and $\gamma$ is the correlation exponent ($1 > \gamma > 0$).
The fluctuation function provided by DFA scales
as

$$F(\Delta t) \sim (\Delta t)^{H} , \qquad (1)$$

where the exponent $H$ is similar to the Hurst exponent
($1/2 < H < 1$, larger exponents correspond to more pronounced
long-term correlations). It is related to the correlation
exponent via

$$\gamma = 2 - 2H . \qquad (2)$$

Fig. 3. Fluctuation exponents of the communication activity
(a) sending and (b) receiving messages by members of POK
(weekly resolution). The exponents are plotted as a function of
the activity level M for the original data (green circles), and
individually shuffled sequences (orange diamonds). See also [12].

For uncorrelated or short-term correlated records the asymp- 
totic fluctuation exponent is H = 1/2 (for a review we 
refer to [23]). 

In order to study the activity with respect to long-term
correlations, we apply second order DFA (DFA2)
[22,24] [linear detrending of $\mu_j(t)$] and obtain the fluctuation
functions, $F_j^{\rm DFA2}(\Delta t)$ (details can be found in [12]).
Since the activity records of the individual members are
too short, we average the squared fluctuation functions
among members with similar overall activity (i.e. total
number of messages, M):
$F(\Delta t) = \left[ \frac{1}{N_M} \sum_{j|M} \left( F_j(\Delta t) \right)^2 \right]^{1/2}$,
where the sum runs over the $N_M$ members in the same activity bin.
To this end, we employ logarithmic bins in M. The activity
distributions are discussed in Sec. 3.4.1.
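As an illustration of the procedure, the following is a minimal pure-Python sketch of DFA with polynomial detrending of the profile (order 2 corresponds to DFA2). It is not the authors' implementation: the function names `poly_residual_ms` and `dfa`, the use of non-overlapping windows only, and the white-noise test record are assumptions made for this example.

```python
import math
import random

def poly_residual_ms(y, order=2):
    """Mean squared residual of y after a least-squares polynomial fit."""
    n, p = len(y), order + 1
    # centred and scaled abscissa keeps the normal equations well conditioned
    x = [(2 * i - (n - 1)) / n for i in range(n)]
    A = [[sum(xi ** (r + c) for xi in x) for c in range(p)] for r in range(p)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * p
    for r in reversed(range(p)):  # back substitution
        s = sum(A[r][c] * coef[c] for c in range(r + 1, p))
        coef[r] = (b[r] - s) / A[r][r]
    fit = (sum(coef[k] * xi ** k for k in range(p)) for xi in x)
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit)) / n

def dfa(record, scales, order=2):
    """Fluctuation function F(s) for each window size s (non-overlapping windows)."""
    mean = sum(record) / len(record)
    profile, acc = [], 0.0
    for v in record:  # profile = cumulative sum of the mean-subtracted record
        acc += v - mean
        profile.append(acc)
    F = []
    for s in scales:
        n_seg = len(profile) // s
        ms = [poly_residual_ms(profile[k * s:(k + 1) * s], order) for k in range(n_seg)]
        F.append(math.sqrt(sum(ms) / n_seg))
    return F

# uncorrelated test record (assumption: Gaussian white noise stands in for activity)
random.seed(1)
white = [random.gauss(0.0, 1.0) for _ in range(8192)]
scales = [16, 32, 64, 128, 256, 512]
F = dfa(white, scales)
H_est = math.log(F[-1] / F[0]) / math.log(scales[-1] / scales[0])
print(round(H_est, 2))  # should come out near 1/2 for uncorrelated data
```

To mimic the averaging over members with similar activity, one would average the squared values of `F` over all records in the same logarithmic bin of M before taking the square root.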

In Fig. 1 we compare, for sending in POK, the fluctuation
functions in daily resolution [Fig. 1(a)] and weekly
resolution [Fig. 1(b)]. In order to match the scales, we have
shifted the curves in Fig. 1(b) along the $\Delta t$-axis. Naturally,
in daily resolution, the fluctuation functions cover more
scales. The asymptotic scaling is the same in both cases,
namely no correlations for the least active members
and strong long-term correlations with fluctuation exponents
close to 1 for the most active members. Moreover,
for POK in daily resolution, the fluctuation functions exhibit
an increase from small slopes on short time scales to
larger slopes on large scales. This indicates that the long-term
correlations do not vanish beyond a certain scale; on
the contrary, they become stronger.
Note that we use weekly resolution in order to cope with
possible weekly oscillations [25,26,27].

We measure the fluctuation exponents by applying least
squares fits to $\log F(\Delta t)$ vs. $\log \Delta t$ on the scales
$10 < \Delta t < 63$ days (QX) and $10 < \Delta t < 70$ weeks (POK). For
the former case the obtained fluctuation exponents are
plotted in Fig. 2 as a function of the members' activity
level, i.e. their total number of messages M. For sending
[panel (a)], the less active members exhibit uncorrelated
behavior. The more messages the members send overall,
the more strongly correlated is their activity. The fluctuation
exponent $H_{\rm QX}$ increases with M and reaches values up to
$0.75 \pm 0.05$ (sending). In contrast, for the shuffled data,
the fluctuation exponents are always very close to 1/2.
This confirms that the long-term correlations are due to
the temporal structure of the times each member sends
his/her messages, see also [14]. For receiving messages,
Fig. 2(b), we find almost identical results. The error bars
in Fig. 2 were calculated by subdividing the groups of different
activity level. The size of the error bars is simply
the standard deviation of the corresponding exponents.
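The exponent estimate amounts to an ordinary least-squares fit in double-logarithmic coordinates, restricted to a window of scales. A minimal sketch follows; the function name `fit_exponent` and the synthetic power-law input are assumptions for illustration, and the window is taken as inclusive here.

```python
import math

def fit_exponent(scales, F, s_min, s_max):
    """Least-squares slope of log F versus log s for s_min <= s <= s_max."""
    pts = [(math.log(s), math.log(f)) for s, f in zip(scales, F)
           if s_min <= s <= s_max and f > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

# synthetic fluctuation function with a known exponent (illustrative values)
scales = [2, 4, 8, 10, 16, 32, 63, 100]
F = [3.0 * s ** 0.75 for s in scales]
H = fit_exponent(scales, F, 10, 63)
```

The same routine can be applied to the fluctuation function of shuffled records, for which the slope is expected to fall back to about 1/2.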

The estimated fluctuation exponents obtained for POK
are displayed in Fig. 3. Qualitatively, we obtain a similar
picture as for QX. However, in contrast to QX, here the
original records achieve larger fluctuation exponents up to
$0.91 \pm 0.04$ (sending), disregarding the last points which
carry large error-bars. A possible reason for these different
maximum exponents could be the much longer period of data
acquisition in the case of POK, as well as possible
non-stationarities [28]. In QX, the members
might not have had enough time to exhibit the full extent
of their persistence, while in POK we follow the entire
evolution of the online community.

Indeed, similar behavior of long-term correlations has
been found in traded values of stocks and e-mail communication
[29,30], where the fluctuation exponent increases
in an analogous way with the mean trading activity of the
corresponding stock or with the average number of e-mails
(see also [31]).

Apart from these, for human related data, long-term
persistence has been reported for physiological records [32,33,22],
written language [34], or for records generated by
collective behavior such as finance and economy [35,36,37],
Ethernet traffic [38], Wikipedia access [39], as well
as highway traffic [40,41]. There are also indications of
long-term correlations in human brain activity [42,43] and
human motor activity [44].

A question that arises is why the fluctuation exponent
(in Figs. 2 and 3) depends on the activity level of the
members, that is, why the least active members exhibit no
persistence while the most active members exhibit strong
persistence. We argue that if only few messages appear
in the whole period of data acquisition, long-term persistence
cannot be reflected. In these cases it is quite possible
that much longer records and a higher aggregation level such
as months or years would be needed to reveal the persistence.
But doing so, there would be other members with
even fewer messages which then again would probably appear
with seemingly uncorrelated message signals. Thus,
Fig. 4. Temporal correlations in the daily amount of messages
on directed links in QX. (a) DFA2 fluctuation functions
versus the time scale $\Delta t$, averaged conditional on the final
number of messages of each link. The different curves correspond
to different activity levels: $M_L$ = 1-2, 3-7, 8-20, 21-54,
55-148, 149-403, 404-1096, 1097-2980 (from bottom to top).
The dotted lines correspond to the exponents $H = 0.75$ (top)
and $H = 1/2$ (bottom). (b) The DFA2 fluctuation exponent
$H_{L,{\rm QX}}$ obtained from (a) is plotted as a function of the activity
level $M_L$. The exponents were obtained in the range of scales
$10 < \Delta t < 63$ days. The activity along directed links comprises
similar long-term correlations as the total activity of individual
members to all of their acquaintances.



we propose that the exponents of the largest activity reflect
more accurately the scaling behavior of human communication
activity. In Sec. 3.3.1 we propose statistical
simulations to generate data using peaks over threshold
(POT) and find that they support this perception.

At this point we need to mention that long-term correlations
can be related to broad inter-event time distributions,
i.e. the times between successive messages of individual
members. Such distributions have been investigated,
see e.g. [13,45], but there is no consensus on the
functional form. We study the inter-event time distributions
in a different publication [14] where we demonstrate
the connection with the long-term correlations found here.



Along directed links 

In Fig. 4 we study for QX the long-term correlations in activity
not on the sender or receiver (node) level but on the
level of messages along directed links. This means that we
track when a message is sent between two members,
but separately for each ordered pair of members, such as $a \to b$,
$b \to a$, $a \to d$, .... Accordingly, we determine the activity
records $\mu_{ab}(t)$, $\mu_{ba}(t)$, $\mu_{ad}(t)$, etc., expressing how many
messages have been sent each day/week, t, between any
pair of members. Analogously, there is also an activity level
for the links, $M_L$ (we disregard those pairings without
activity). Then we perform the analogous analysis for
long-term correlations by applying DFA2 and averaging
among pairings with similar overall activity (the distributions
of activity are discussed in Sec. 3.4.1). The fluctuation
functions in Fig. 4(a) have asymptotic slopes close to
1/2 for those links with few messages in total. In
contrast, those links with many messages in total
exhibit long-term correlations with exponents up to 0.74.
The fluctuation exponents as a function of the activity
level $M_L$ are plotted in Fig. 4(b). Apart from the fact
that, by definition, the number of messages on the most active
links is lower than (or equal to) the number of messages
of the most active members, the curve looks very similar
to the one in Fig. 2(a); in particular the maximum exponents
are quite similar ($H_{\rm QX} \approx 0.75$ and $H_{L,{\rm QX}} \approx 0.74$).
This indicates that the persistence in the communication
may be dominated by the communication activity with
the most liked partners.

In [46] a different concept of link persistence has been
investigated. The period of data acquisition is partitioned
into time slices in each of which a network is built. Then
the persistence is defined as the normalized number of
time slices in which a certain link appears. However, the
approach of our work is not compatible with the one in
[46] and we cannot directly compare the results.



3.2 Growth process 

3.2.1 Growth in the number of messages 

As suggested in [12] we also analyze the growth properties
of the message activity. This concept is borrowed from
econophysics, where the growth of companies has been
found to exhibit non-trivial scaling laws [1] that in particular
violate the original Gibrat's law [9,10,11,47] and at the
same time represent a generalized Gibrat's law (GGL)
[12]. In the present study, each member is considered as
a unit and the number of messages sent or received since
the beginning of data acquisition represents its size. We
analyze the growth in the number of messages in analogy
to other systems such as the growth of companies [1,48]
or the growth of cities [7,49]. The analogy is supported by
some aspects: (i) The members of a community represent
a population similar to the population of a country. (ii)
The number of members fluctuates and typically grows
analogous to the number of cities of a country. (iii) The
activity or number of links of individuals fluctuates and
grows similar to the size of cities.

The cumulative number, $m^j(t)$, expresses how many
messages have been sent by a certain member j up to a
given time t [for better readability we will not write
the index j explicitly, $m(t)$]. We consider the evolution of
$m(t)$ between times $t_0$ and $t_1$ within the period of data
acquisition T ($t_0 < t_1 \le T$) as a growth process, where
each member exhibits a specific growth rate $r_j$ ($r$ for short
notation):

$$r = \ln \frac{m_1}{m_0} , \qquad (3)$$

where $m_0 \equiv m(t_0)$ and $m_1 \equiv m(t_1)$ are the numbers of
messages sent until $t_0$ and $t_1$, respectively, by every member.
To characterize the dynamics of the activity, we consider
two measures. (i) The conditional average growth
rate, $\langle r(m_0) \rangle$, quantifies the average growth of the number
of messages sent by the members between $t_0$ and $t_1$
depending on the initial number of messages, $m_0$. In other
words, we consider the average growth rate of only those
members that have sent $m_0$ messages until $t_0$. (ii) The
conditional standard deviation of the growth rate for those
members that have sent $m_0$ messages until $t_0$,

$$\sigma(m_0) = \sqrt{ \left\langle \left[ r(m_0) - \langle r(m_0) \rangle \right]^2 \right\rangle } , \qquad (4)$$

expresses the statistical spread or fluctuation of growth
among the members depending on $m_0$. Both quantities are
relevant in the context of Gibrat's law in economics [9,10,11,47],
which proposes a proportionate growth process entailing
the assumption that the average and the standard
deviation of the growth rate of a given economic indicator
are constant and independent of the specific indicator
value. That is, both $\langle r(m_0) \rangle$ and $\sigma(m_0)$ are independent
of $m_0$.

As shown in [12], for the message data the conditional
average growth rate is almost constant and only decreases
slightly,

$$\langle r(m_0) \rangle \sim m_0^{-\alpha} , \qquad (5)$$

with an exponent $\alpha \approx 0.05$. This means that members
with many messages on average increase their number of
messages at almost the same rate as members with few
messages. In contrast, the conditional standard deviation
clearly decreases with increasing $m_0$,

$$\sigma(m_0) \sim m_0^{-\beta} , \qquad (6)$$

where $t_0 = T/2$ is optimal in terms of statistics. In this case
the exponents $\beta_{\rm QX} = 0.22 \pm 0.01$ and $\beta_{\rm POK} = 0.17 \pm 0.03$
for sending messages were found [12]. This means that,
although the average growth rate almost does not depend
on $m_0$, the conditional standard deviation of the growth
of members with many messages is smaller than the one of
members with few messages. Due to weaker fluctuations,
active members are relatively better predictable in their
activity of sending messages.

It has been shown that the fluctuation exponent $H$ and
the growth fluctuation exponent $\beta$ are related via [12]

$$\beta = 1 - H . \qquad (7)$$

Equation (7) is a scaling law formalizing the relation between
growth and long-term correlations in the activity.
According to Eq. (7), the original Gibrat's law ($\beta_{\rm G} = 0$)
corresponds to very strong long-term correlations with
$H_{\rm G} = 1$. In contrast, $\beta_{\rm rnd} = 1/2$ represents completely
random activity ($H_{\rm rnd} = 1/2$). The observed message data
comprises $1/2 > \beta > 0$ and $1/2 < H < 1$. Surprisingly,
the values of $\beta$ found here are very close to the $\beta$ values
found for companies in the US economy [1].
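The random limit $\beta_{\rm rnd} = 1/2$ can be checked numerically with a small sketch of the growth-rate statistics of Eqs. (3) and (4). Everything below is an assumption made for illustration: the senders are uncorrelated, their message counts in each half of the period are approximated by rounded Gaussian numbers with variance equal to the mean, and the helper names `conditional_stats` and `slope` are hypothetical.

```python
import math
import random

def conditional_stats(m0, r, edges):
    """(bin centre, mean r, std r) for members with edges[k] <= m0 < edges[k+1]."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = [ri for mi, ri in zip(m0, r) if lo <= mi < hi]
        if len(sel) < 2:
            continue
        mean = sum(sel) / len(sel)
        var = sum((x - mean) ** 2 for x in sel) / len(sel)
        out.append((math.sqrt(lo * hi), mean, math.sqrt(var)))
    return out

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

random.seed(0)
m0, m1 = [], []
for _ in range(40000):
    # expected messages per half period, spread over two decades (assumed range)
    lam = math.exp(random.uniform(math.log(30), math.log(3000)))
    first = max(1, round(random.gauss(lam, math.sqrt(lam))))    # messages until t0 = T/2
    second = max(0, round(random.gauss(lam, math.sqrt(lam))))   # messages between t0 and t1 = T
    m0.append(first)
    m1.append(first + second)

r = [math.log(b / a) for a, b in zip(m0, m1)]                   # Eq. (3)
stats = conditional_stats(m0, r, [50, 100, 200, 400, 800, 1600])
beta = -slope([math.log(c) for c, _, _ in stats],
              [math.log(s) for _, _, s in stats])               # Eq. (6)
print(round(beta, 2))
```

In this uncorrelated setting the conditional mean stays close to $\ln 2$ while the conditional standard deviation decays roughly as $m_0^{-1/2}$, which is the reference against which the measured exponents $\beta \approx 0.2$ are compared.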

In the case of companies, the distribution of growth
rates has also been studied. It was found that the distribution
density follows [1]:

$$p(r|m_0) = \frac{1}{s\,\sigma(m_0)} \exp\left( - \frac{s \left| r - \langle r(m_0) \rangle \right|}{\sigma(m_0)} \right) , \qquad (8)$$

where $s = \sqrt{2}$. Next we analyze how the growth rates $r$
are distributed in the case of the message data.

First we need to point out that, in contrast to the
growth of companies, our entities can never shrink. The
members cannot lose messages; the number $m(t)$ either
increases or remains the same. Accordingly, in our case
$r \ge 0$ and therefore $s = 1$, as can be derived for the
single-sided exponentially decaying distribution.

Figure 5 shows $p(r|m_0)$ for QX where the values are
scaled to collapse according to Eq. (8) with $s = 1$. In order
to have reasonable statistics, we define the condition $m_0$ in
rather wide ranges, namely according to the decimal logarithm.
For sending [Fig. 5(a)] and receiving [Fig. 5(b)]
messages the scaled probability densities collapse and are
quite similar. Nevertheless, the growth rates do not exactly
follow Eq. (8) with $s = 1$. While for the less active
members with small growth rates we find good agreement,
for more active members and large growth rates the
obtained curves deviate from the theoretical one towards
a steeper decay.

The corresponding results for POK are shown in Fig. 6.
Again, sending and receiving are very similar. The curves
collapse reasonably, but in contrast to QX here the measured
$p(r|m_0)$ overall deviate from the theoretical one,
comprising less steep slopes.

We argue that, just as distribution and correlation properties
are in most cases independent for single time series,
the same holds for the message data and the growth. The
distribution of growth rates $p(r|m_0)$ seems to be independent
of the long-term correlations, which are reflected
in $\sigma(m_0)$ with the exponent $\beta$.

Fig. 5. Scaled probability density of growth rates $r$, Eq. (3),
in the number of messages by members of QX. (a) Sending
and (b) receiving. The times for $m_0$ and $m_1$ have been chosen
as $t_0 = T/2$ and $t_1 = T$. The symbols correspond to different
initial numbers of messages $m_0$. The axes are scaled assuming
a distribution according to Eq. (8) with $s = 1$, which then
corresponds to the dotted lines.

Fig. 6. Scaled probability density of growth rates $r$, Eq. (3),
in the number of messages by members of POK. (a) Sending
and (b) receiving. Analogous to Fig. 5.

3.2.2 Mutual growth in the number of messages

Next we study a variation of growth. Instead of considering
the absolute number of messages a member sends, we
study the difference in the number of messages compared
to any other member, the mutual difference $m^i(t) - m^j(t)$.
Thus, the growth rate is defined analogously to Eq. (3),

$$r_\times = \ln \frac{m_1^i - m_1^j}{m_0^i - m_0^j} , \qquad (9)$$

where now there is a growth rate for every pair of members
i and j. The conditional average growth rate and the
corresponding standard deviation are then taken over all
possible pairs and the condition is the difference at $t_0$,
$m_0^i - m_0^j = m^i(t_0) - m^j(t_0)$, providing the quantities
$\langle r_\times(m_0^i - m_0^j) \rangle$ and $\sigma(m_0^i - m_0^j)$. We disregard combinations
of i and j where $m_0^i - m_0^j = 0$ or
$(m_1^i - m_1^j)/(m_0^i - m_0^j) < 0$.

The results for sending in QX are shown in Fig. 7.
Apart from a small decrease up to $m_0^i - m_0^j \approx 50$, the average
growth rate is constant [Fig. 7(a)]. The conditional
standard deviation asymptotically follows a slope $\beta_\times \approx 0.3$
with deviations to small exponents for small $m_0^i - m_0^j$.
In the case of the shuffled data [Fig. 7(b)], as expected, the
average growth rate is constant while the standard deviation
decreases more steeply than for the original data, namely
with $\beta_{\times,{\rm rnd}} = 1/2$, although not along a clean straight line.
Nevertheless, we conclude that the scaling of the standard
deviation in Fig. 7(a) must be due to temporal correlations
between the members. The growth of the difference
between their numbers of messages comprises similar scaling
as the individual growth.

We conjecture that $\sigma(m_0^i - m_0^j)$ reflects long-term cross-correlations
in analogy to $\sigma(m_0)$ for auto-correlations. However,
so far, we are not able to provide further evidence for
this analogy and the corresponding relation to $\beta = 1 - H$,
Eq. (7), since an appropriate technique for the direct quantification
of long-term cross-correlations is lacking.

Fig. 7. Average mutual growth rate and standard deviation
versus foregoing difference in the number of messages for sending
in QX. The average (open squares) and standard deviation
(filled circles) of the mutual growth rate $r_\times$, Eq. (9), are plotted
conditional on the initial difference $m_0^i - m_0^j$, where $t_0 = T/2$
and $t_1 = T$. (a) Original data and (b) shuffled data. The dotted
line in (a) corresponds to the exponent $\beta_\times = 0.3$ and in (b) to
$\beta_\times = 1/2$.
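The mutual growth rate of Eq. (9) can be computed pair by pair, as in the following minimal sketch. The function name `mutual_growth` and the dictionary input format are hypothetical. Two choices here are assumptions of this example rather than statements from the text: each unordered pair is oriented so that the initial difference is positive, and pairs whose final difference is zero are dropped as well, since the logarithm would diverge.

```python
import math
from itertools import combinations

def mutual_growth(m0, m1):
    """List of (initial difference, r_x) over pairs of members.

    m0, m1: dicts mapping member -> cumulative messages at t0 and t1.
    """
    out = []
    for i, j in combinations(m0, 2):
        d0 = m0[i] - m0[j]
        d1 = m1[i] - m1[j]
        if d0 < 0:               # orient the pair so that the initial difference is positive
            d0, d1 = -d0, -d1
        if d0 == 0 or d1 <= 0:   # undefined or divergent logarithm: pair is disregarded
            continue
        out.append((d0, math.log(d1 / d0)))
    return out

# toy cumulative counts (illustrative numbers only)
m0 = {"a": 10, "b": 4, "c": 4}
m1 = {"a": 22, "b": 10, "c": 3}
pairs = mutual_growth(m0, m1)
```

The resulting pairs can then be binned by the initial difference to obtain the conditional average and standard deviation, in the same way as for the individual growth rates.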

3.3 Modeling 

In what follows, we propose numerical simulations with
the purpose of testing the methods and empirical patterns
we found. We study three approaches adapted to
the modeling of human activity: (a) peaks over thresholds,
(b) preferential attachment [50], and (c) cascading
Poisson process [27].

3.3.1 Peaks over threshold (POT) simulations 

Our finding that the activity of sending messages exhibits
long-term persistence asserts the existence of an underlying
long-term correlated process. This can be understood
as an unknown individual state driven by various internal
and external stimuli [51,52,53,54,27,33] increasing the
probability to send messages. Generating such a hypothetical
long-term correlated internal process $(x_i)$, simulated
message data can be defined by the instants at which this
internal process exceeds a threshold $q$ (peaks over threshold,
POT), see [55,56,57] and references therein.

More precisely, we consider a long-term correlated sequence
$(x_i)$ consisting of $N^*$ random numbers that is normalized
to zero average ($\langle x \rangle = 0$) and unit standard deviation
($\sigma_x = 1$). Choosing a threshold $q$, at each instant i
the probability to send a message is:

$$p_i = \begin{cases} 1 & \text{for } x_i > q , \\ 0 & \text{for } x_i \le q . \end{cases} \qquad (10)$$

Thus, the message events are given by the indices i of
those random numbers $x_i$ exceeding $q$.

Figure 8(a) illustrates the procedure. The random numbers
are plotted as brown circles and the events exceeding the
threshold (orange dashed line) as green diamonds. The
resulting instants, representing the simulated messages, are
depicted in Fig. 8(b). The threshold approximately
predefines the total number of events and accordingly the
average inter-event time. Using normally distributed numbers
$(x_i)$, the expected number of events/messages is
approximately $N^* [1 - \Phi(q)]$, where $\Phi$ is the
cumulative distribution function of the standard normal
distribution; conversely, the threshold for a desired number
of events follows from its inverse (probit function).
Additionally, the random numbers we use are long-term
correlated with variable fluctuation exponent. We impose
these auto-correlations using the Fourier Filtering Method
[23,58]. Next we show that this process reproduces the
scaling in the growth, i.e. GGL, as well as the variable
long-term correlations in the activity of the members (e.g.
Figs. [5] and |3]).
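The generation of the underlying record can be sketched as follows. This is a simplified Fourier filtering variant under our own assumptions, not necessarily the exact procedure of the cited references: white Gaussian noise is transformed, its amplitudes are rescaled so that the power spectrum decays as $f^{-(2H-1)}$, and the result is transformed back and normalized. The helper `expected_events` evaluates $N^* [1 - \Phi(q)]$ for standard normal numbers; both function names are hypothetical.

```python
import math
import numpy as np

def fourier_filtered_noise(n, hurst, rng):
    """Gaussian series of length n with power spectrum S(f) ~ f^-(2H-1)."""
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)          # the zero-frequency mode is dropped
    scale[1:] = freqs[1:] ** (-(2.0 * hurst - 1.0) / 2.0)
    x = np.fft.irfft(spectrum * scale, n=n)
    return (x - x.mean()) / x.std()       # zero mean, unit standard deviation

def expected_events(n, q):
    """Expected number of values above q among n standard normal numbers."""
    return n * 0.5 * math.erfc(q / math.sqrt(2.0))

rng = np.random.default_rng(0)
x = fourier_filtered_noise(2**14, hurst=0.9, rng=rng)
n_events = int(np.count_nonzero(x > 1.0))
```

For a single long-term correlated record the observed count `n_events` can deviate noticeably from `expected_events(len(x), 1.0)`, since the effective number of independent values is reduced by the correlations.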

For testing this process we create 100,000 independent
long-term correlated records $(x_i)$ of length
$N^* = 131,072$, impose the fluctuation exponent
$H_{\rm imp}$, and choose for each one a random threshold $q$
between 1 and 6, each record representing a sender.
Extracting the peaks over threshold, we obtain the events
and determine for each record/member the



Fig. 8. Illustration of the peaks over threshold simulations.
(a) An underlying and unknown long-term correlated process
determines the instantaneous probability of sending messages.
Once this state passes a certain threshold $q$ (dashed orange
line), a message is sent (green diamonds). (b) Generated
instants of messages, (c) with windows for aggregation, such
as messages per day. (d) Aggregated record of messages in
windows of size $w$, here $w = 10$. Horizontal axis: time [w].

growth in the number of events/messages between $N^*/2$
and $N^*$. That is, for each record/member we count the
number of events/messages $m_0$ until $t_0 = N^*/2$ as well
as $m_1$ until $t_1 = N^*$ and calculate the growth rate
according to Eq. (3). We then calculate the conditional
average $\langle r(m_0) \rangle$ and the conditional standard
deviation $\sigma(m_0)$, where the values of $m_0$ are binned
logarithmically. The quantities are plotted in Fig. 9(a) and
(b); in panel (b) we include the slopes expected from
$\beta = 1 - H$, Eq. (7). We find that the numerical results
agree reasonably well with the prediction (solid lines).
Except for small $m_0$, these results are consistent with
those found in the original message data.
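The conditional statistics can be sketched in a few lines of Python. We assume here that the growth rate has the logarithmic form $r = \ln(m_1/m_0)$; the binning resolution `bins_per_decade` and the function name are our own illustrative choices.

```python
import numpy as np

def conditional_growth_stats(m0, m1, bins_per_decade=4):
    """Mean and standard deviation of r = ln(m1/m0) in logarithmic bins of m0."""
    m0 = np.asarray(m0, dtype=float)
    m1 = np.asarray(m1, dtype=float)
    keep = m0 > 0                      # the logarithm requires m0 > 0
    r = np.log(m1[keep] / m0[keep])    # assumed form of the growth rate
    bin_index = np.floor(np.log10(m0[keep]) * bins_per_decade).astype(int)
    rows = []
    for b in np.unique(bin_index):
        sel = bin_index == b
        center = 10.0 ** ((b + 0.5) / bins_per_decade)
        rows.append((center, r[sel].mean(), r[sel].std()))
    return rows                        # (bin center, <r(m0)>, sigma(m0))
```

Plotting the third column against the first in double logarithmic representation gives an estimate of $\beta$ from the slope.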

The fluctuation functions can be studied in the same way.
As described in Sec. 3.1, we find long-term correlations in
the sequences of messages per day or per week. We analyze
the simulated messages explained above in an analogous way.
For each threshold $q = 1.0, 1.5, \dots, 4.0, 4.5$ we create
100 long-term correlated records of length
$N^* = 4,194,304$ with imposed fluctuation exponent
$H_{\rm imp} = 0.9$, extract the simulated message events,
and aggregate them in non-overlapping windows of size
$w = 100$. That is, we tile $N^*$ into segments of size $w$
and count the number of events occurring in each segment
[Fig. 8(c) and (d)]. The obtained aggregated records are the
analog of messages per day or per week and are analyzed with
DFA, averaging the fluctuation functions among those
configurations with the same
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Fig. 9. Results of numerical simulations, (a) Mean growth 
rate conditional on the number of events until $t_0 = N^*/2$
as obtained from 100,000 long-term correlated records of
length $N^* = 131,072$ with variable imposed fluctuation
exponent $H_{\rm imp}$ between 1/2 and 0.9 and random
threshold $q$ between 1.0 and 6.0. (b) As before but standard
deviation conditional on the number of events. The solid
lines represent power-laws with exponents $\beta$ expected
from the imposed long-term correlations according to Eq. (7).
(c) Long-term correlations in the sequences of aggregated
peaks over threshold. For every threshold $q$ between 1.0
(violet) and 4.5 (black), 100 normalized records of length
$N^* = 4,194,304$ have been created with
$H_{\rm imp} = 0.9$. The events are aggregated in windows of
size $w = 100$. The panel shows the averaged DFA2
fluctuation functions. (d) Fluctuation exponents on the
scales $1,000 < s < 10,000$ as a function of the total
number of events.

threshold and thus a similar total number of events. The
corresponding results are shown in Fig. 9(c) and (d). We
obtain results very similar to those for the original data.
We find vanishing correlations for the sequences with few
events (large $q$) and pronounced long-term correlations
for the cases of many events (small $q$), while the maximum
fluctuation exponent corresponds to the chosen
$H_{\rm imp}$. This can be understood by the fact that for
$q$ close to zero the sequence of the number of events per
window converges to the aggregated sequence of 0 or 1 (for
$x < 0$ or $x > 0$), reflecting the same long-term
correlation properties as the original record [59]. For a
large threshold $q$, too few events occur to measure the
correct long-term correlations, i.e. the true scaling only
emerges on larger, inaccessible time scales requiring larger
$w$ and longer records.
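The aggregation into windows and the subsequent detrended fluctuation analysis can be sketched as follows. This is a simplified illustration under our own assumptions: segments are only taken from the beginning of the record (the common practice of also tiling from the end is omitted), and the function names are hypothetical. With `order=2` the detrending corresponds to DFA2.

```python
import numpy as np

def aggregate_events(event_indices, n, w):
    """Count events in non-overlapping windows of size w tiling 0..n-1."""
    n_windows = n // w
    idx = np.asarray(event_indices, dtype=int)
    idx = idx[idx < n_windows * w]            # drop an incomplete last window
    return np.bincount(idx // w, minlength=n_windows)

def dfa(record, scales, order=2):
    """Fluctuation function F(s) of detrended fluctuation analysis."""
    record = np.asarray(record, dtype=float)
    profile = np.cumsum(record - record.mean())
    fluct = []
    for s in scales:
        n_seg = len(profile) // s
        segments = profile[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        # Fit and remove a polynomial trend in all segments at once.
        coeffs = np.polynomial.polynomial.polyfit(t, segments.T, order)
        trend = np.polynomial.polynomial.polyval(t, coeffs)
        fluct.append(np.sqrt(np.mean((segments - trend) ** 2)))
    return np.array(fluct)
```

The fluctuation exponent is then estimated as the slope of $\log F(s)$ versus $\log s$ on the scales of interest, for instance with `np.polyfit(np.log(scales), np.log(F), 1)[0]`.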

Although the simulations do not reveal the origin of the
long-term correlated patchy behavior, they support Eq. (7)
and the concept of an underlying long-term correlated
process. Consistently, an uncorrelated, completely random
underlying process recovers Poisson statistics and therefore
$\beta_{\rm rnd} = 1/2$ for the growth fluctuations as well
as uncorrelated message activity ($H_{\rm rnd} = 1/2$). For
$1/2 < H < 1$ it has been shown [55,57] that the inter-event
times follow a stretched exponential, see also [13,60,14].
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Fig. 10. Average degree growth rate and standard deviation 
versus foregoing degree for the preferential attachment
network model [50]. The average (green open diamonds) and
standard deviation (blue filled circles) of the growth rate
$r_{\rm BA}$ are plotted conditional on $k_0$, the degree of
the corresponding nodes at the first stage. We choose average
degree $\langle k \rangle = 20$, 50,000 nodes at $t_0$, and
100,000 nodes at $t_1$. The error bars are taken from 10
configurations. The dashed line at the bottom corresponds to
$\beta_{\rm BA} = 1/2$.

3.3.2 Preferential attachment 

Next we compare our findings with the growth properties of
a network model. We investigate the Barabási-Albert (BA)
model, which is based on preferential attachment and was
introduced to generate a class of scale-free networks
[50,61] with power-law degree distribution $p(k)$ [62,63],
where the degree $k$ of a node is the number of links it has
to other nodes. Essentially, the model consists of
successively adding nodes to the network and linking them to
existing nodes, which are chosen randomly with a probability
proportional to their degree.

We obtain the undirected network and study the degree growth
properties by calculating the conditional average growth
rate $\langle r_{\rm BA}(k_0) \rangle$ and the conditional
standard deviation $\sigma_{\rm BA}(k_0)$ obtained from the
scale-free BA model. The times $t_0$ and $t_1$ are defined by
the number of nodes attached to the network.
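A compact way to obtain the degrees at the two stages is sketched below. It is an illustration under our own assumptions rather than the implementation used for the figure: the network is seeded with a complete graph on $m + 1$ nodes, each new node attaches to $m$ distinct existing nodes (so that the average degree approaches $2m$), and preferential attachment is realized by sampling uniformly from a list in which every node appears once per link end. The function name is hypothetical.

```python
import math
import random

def ba_degree_snapshots(n0, n1, m, seed=0):
    """Grow a BA network to n1 nodes; return degrees at n0 and at n1 nodes.

    Requires m + 1 <= n0 < n1.
    """
    rng = random.Random(seed)
    degree = [0] * n1
    stubs = []                       # node i appears degree[i] times
    for i in range(m + 1):           # seed: complete graph on m + 1 nodes
        for j in range(i + 1, m + 1):
            stubs += [i, j]
            degree[i] += 1
            degree[j] += 1
    snapshot = None
    for new in range(m + 1, n1):
        if new == n0:                # the network currently has n0 nodes
            snapshot = degree[:n0]
        targets = set()
        while len(targets) < m:      # uniform stub sampling = preferential attachment
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            stubs += [new, t]
            degree[new] += 1
            degree[t] += 1
    return snapshot, degree

k0, k1 = ba_degree_snapshots(n0=500, n1=1000, m=10)
growth = [math.log(k1[i] / k0[i]) for i in range(len(k0))]
```

The list `growth` holds the logarithmic degree growth of the nodes present at the first stage; its conditional mean and standard deviation as a function of `k0` are the quantities discussed in the text. The sizes used here are much smaller than those of the figure and only serve to keep the example fast.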

Figure 10 shows the results, where an average degree
$\langle k \rangle = 20$, 50,000 nodes at $t_0$, and 100,000
nodes at $t_1$ were chosen. We find a constant average
growth rate that does not depend on the initial degree
$k_0$. The conditional standard deviation is a function of
$k_0$ and exhibits a power-law decay with
$\beta_{\rm BA} = 1/2$, as expected for such an uncorrelated
growth process [12]. Therefore, a purely preferential
attachment type of growth is not sufficient to describe the
type of social network dynamics found in Sec. 3.2.1, since
additional temporal correlations are involved in the
dynamics of establishing acquaintances in the community.

The value $\beta_{\rm BA} = 1/2$ in Eq. (7) corresponds to
$H = 1/2$, indicating complete randomness: there is no
memory in the system. Since each addition of a new node is
completely independent of the preceding ones, there cannot
be temporal correlations in the activity of adding links. In
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contrast, for the out-degree in QX and POK we obtained 
$\beta_{k,{\rm QX}} = 0.22 \pm 0.02$ and
$\beta_{k,{\rm POK}} = 0.17 \pm 0.08$ [12], which is
supported by the (non-linear) correlations between the
number of messages and the out-degree as presented in
Sec. 3.4.

Interestingly, an extension of the standard BA model has
been proposed [64], see also [65,66], that takes into account
different fitness of the nodes for acquiring links. We think
that such fitness could be related to growth fluctuations,
thus providing a route to modify the BA model to include the
long-term correlated dynamics found here.



3.3.3 Cascading Poisson process 

In this Section we elaborate on the model proposed in [27]
and examine it with respect to long-term correlations. The
model is based on a cascading Poisson process (CPP),
according to which the rate at which a member enters an
active interval is $\rho(t) = N_w \, p_d(t) \, p_w(t)$, where
$N_w$ is the average number of active intervals per week,
$p_d(t)$ is the probability of starting an active interval
at a particular time of the day, and $p_w(t)$ is the
probability of starting an active interval at a particular
day of the week. Once a member enters such an active
interval, he/she sends a set of $N_a + 1$ messages, where
$N_a$ is drawn from the distribution $p(N_a)$. The messages
in such an active interval are sent randomly, i.e. according
to a homogeneous Poisson process with rate $\rho_a$ events
per hour.
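The cascade described above can be sketched in Python as follows. This is our own simplified discretization, not the implementation of [27]: the week is resolved in hours, the number of active intervals starting in a given hour is Poisson distributed with mean $N_w \, p_w \, p_d$, the start time is placed uniformly within that hour, and the $N_a$ further messages follow with exponentially distributed gaps of mean $1/\rho_a$ hours. The distributions `p_w`, `p_d`, and `p_na` in the usage lines are purely illustrative placeholders; only $N_w = 7.3$ and $\rho_a = 1.7$ are values quoted in this Section.

```python
import numpy as np

def simulate_cpp(n_days, n_w, p_w, p_d, p_na, rho_a, rng):
    """Daily message counts from a cascading Poisson process (hourly grid)."""
    counts = np.zeros(n_days, dtype=int)
    for day in range(n_days):
        for hour in range(24):
            # Expected number of active intervals starting in this hour.
            rate = n_w * p_w[day % 7] * p_d[hour]
            for _ in range(rng.poisson(rate)):
                start = 24.0 * day + hour + rng.random()
                n_a = rng.choice(len(p_na), p=p_na)   # further messages
                gaps = rng.exponential(1.0 / rho_a, size=n_a)
                times = start + np.concatenate(([0.0], np.cumsum(gaps)))
                days = (times // 24).astype(int)
                np.add.at(counts, days[days < n_days], 1)
    return counts

rng = np.random.default_rng(0)
p_w = np.array([0, 1, 1, 1, 1, 1, 0]) / 5.0   # day 0 = Sunday, day 6 = Saturday
p_d = np.full(24, 1.0 / 24.0)                 # flat daily profile (placeholder)
mu = simulate_cpp(83, n_w=7.3, p_w=p_w, p_d=p_d,
                  p_na=[0.5, 0.3, 0.2], rho_a=1.7, rng=rng)
```

Since `p_w` sums to one over the seven days and `p_d` over the 24 hours, the expected number of active intervals per week equals `n_w`, in line with the definition of $N_w$.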

First, we study the example of User 2881 as analyzed in [27]
(please note that in [27] a different data set is studied
and the user is neither in OC1 nor in OC2). We extract from
[27] the parameters $N_w = 7.3$ active intervals per week
and $\rho_a = 1.7$ events per hour, as well as (visually)
the distributions $p_d(t)$, $p_w(t)$, and $p(N_a)$. The
original period of 83 days is not sufficient to apply DFA,
so we run the model for this set of parameters over 800k
days. Then we extract the record of the number of messages
per day, $\mu(t)$, and apply DFA. The obtained fluctuation
functions are shown in Fig. 11(a). On small scales below
100 days a hump in $F(\Delta t)$ can be identified, which is
due to oscillations in $\mu(t)$ [23]. While asymptotically
the influence of oscillations vanishes, on scales below the
wavelength the oscillations appear as correlations
[increased slope in $F(\Delta t)$] and on scales above the
wavelength they appear as anti-correlations [decreased slope
in $F(\Delta t)$]. Asymptotically, we find
$F(\Delta t) \sim (\Delta t)^{1/2}$, i.e.
$H_{\rm CPP} \approx 1/2$, corresponding to a lack of
long-term correlations. Even on scales up to 83 days, we
rather find $H_{\rm CPP} < 1/2$ (which we expect from the
imposed weekly oscillations).

Next, we study 20,000 simulated e-mail senders with randomly
chosen parameters. (i) We fill $p_w(t)$ with random numbers
and set $p_w(t) = 0$ for $t = 0, 6$, i.e. Sunday and
Saturday. (ii) We fill $p_d(t)$ with random numbers and set
$p_d(t) = 0$ for $t = 0 \dots 5$ and $t = 23$, i.e. at
night. (iii) We set $p(N_a)$ starting with a random
$p(N_a = 0)$. Then $p(N_a)$ decays exponentially up to a
random $N_a$ below 36. $p_w(t)$, $p_d(t)$, and $p(N_a)$ are
normalized. (iv) We randomly choose $0 < N_w < 40$. (v) We
randomly
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Fig. 12. Correlations between the final degree K and the final 
number of messages M for QX. (a) Out-degree and sending 
messages; (b) in-degree and receiving messages. The dashed 
lines correspond to a power-law with exponent 0.75. Members 
sending many messages also tend to have high out-degree, but 
not linearly, rather following a power-law. 



choose $0 < \rho_a < 30$. From the Supporting Information 2
(SI2) of [27] we estimated the typical maximum values of
$N_a$, $N_w$, and $\rho_a$ (36, 40, and 30, respectively).
We run the model for 83 days and extract $\mu(t)$ for each
simulated e-mail sender. Then we apply DFA2 and average the
fluctuation functions according to the final number of
messages, $M$, see also [12]. The fluctuation functions for
the various activity levels are depicted in Fig. 11(b). We
find that members with a small final number of messages
exhibit uncorrelated behavior. The more active the members,
the more pronounced the oscillations become, which we
already discussed in the context of Fig. 11(a). Thus, the
asymptotic $H_{\rm CPP} < 1/2$ for large $M$ is due to the
weekly cycles [53]. In our data, oscillations do not
dominate the DFA fluctuation functions. Moreover, for OC2
we also find long-term correlations in weekly resolution
(Fig. [T]).

Based on periodic probabilities and Poisson statistics, the
CPP model represents a powerful concept to characterize
inter-event times. For this purpose, the average $N_w$ seems
to be sufficient. However, in order to recover long-term
correlations, a time-dependent $N_w = N_w(t)$ seems to be
necessary. In fact, the number of active intervals per week,
$N_w$, fluctuates, as can be seen in SI2 of [27] (uppermost
row of the panels). Thus, we suggest extending the model by
introducing a memory kernel, see e.g. [54], or by using a
long-term correlated $N_w(t)$.



3.4 Other correlations 

In this Section we discuss other types of correlations.
Figure 12 shows for QX the final degree $K = k(T)$ versus
the final number of messages $M = m(T)$. We find that for
both sending and receiving, the two quantities are
correlated according to:

$$ K \sim M^{\lambda} \quad \text{with} \quad \lambda \approx 3/4 . \qquad (11) $$
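An exponent such as the one in Eq. (11) can be estimated from the slope in double logarithmic representation. The following sketch uses an ordinary least-squares fit of $\log K$ versus $\log M$; this is one common choice and an assumption on our side, not necessarily the procedure used for the figure. The function name is hypothetical.

```python
import numpy as np

def fit_power_law_exponent(m, k):
    """Least-squares slope of log K versus log M (requires m, k > 0)."""
    slope, _intercept = np.polyfit(np.log(m), np.log(k), 1)
    return slope
```

In practice one would first average $K$ in logarithmic bins of $M$, so that the many members with few messages do not dominate the fit.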



Similar relations have also been found for other data [67].
Since the correlations are positive, those members that
send many messages, in average, also have more acquain- 
tances to whom they send, but they know less acquain- 
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Fig. 11. DFA fluctuation functions of message data created with the model proposed in 27 . (a) User 2881. We visually extract 
the model parameters from Fig. 3 in [27] and generate message data for approx. 800k days. The panel shows the fluctuation
functions from DFA1 and DFA2, which asymptotically go as $\sim (\Delta t)^{1/2}$, i.e. no long-term correlations. The hump on small scales
is due to the oscillations inherent in the model [23]. The dashed vertical line is placed at $\Delta t = 83$ days. (b) Random parameterization.
We randomly choose the model parameters, create 20k records of 83 days, and average the obtained DFA2 fluctuation functions
according to the final number of messages, $M$. While for those simulated members with few messages we find $F(\Delta t) \sim (\Delta t)^{1/2}$,
the $F(\Delta t)$ of the most active members exhibit a hump due to oscillations similar to the one in panel (a).
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Fig. 13. Correlations between activity and passivity for QX. 
(a) Final number of messages M received and sent; (b) final 
in- and out-degree. The dashed lines correspond to a linear 
relation. Those members who send many messages also receive 
many. But, those members who know many people to whom 
they send messages do not necessarily know as many people 
from whom they receive messages. 



tances than they would in the case of linear correlations. 
For receiving, Fig. 12(b), this correlation is very similar.

The number of messages sent versus the number of messages
received (for QX) is displayed in Fig. 13(a).
Asymptotically, the activity and passivity are linearly
related, and on average for every message sent there is a
received one or vice versa. This, of course, does not mean
that every message is replied to. However, the less active
members on average tend to receive more messages than they
send. For example, those members who send on average one
message receive about three. Nevertheless, the more active
the members are, the more the sending and receiving behavior
approaches the linear relation. In contrast, for the degree,
Fig. 13(b), the asymptotic linear relation does not hold.
Those members with large out-degree and small in-degree are
referred to as spammers, since they send to many different
people but only receive from few.

For POK we find similar results in Figs. 14 and 15.
The final degree and the final number of messages also
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Fig. 14. Correlations between the final degree K and the final 
number of messages M for POK. (a) Out-degree and sending 
messages; (b) in-degree and receiving messages. Analogous to
Fig. 12.

Fig. 15. Correlations between activity and passivity for POK. 
(a) Final number of messages M received and sent; (b) final 
in- and out-degree. Analogous to Fig. 13.



scale with an exponent close to 0.75, although for sending
messages there are some deviations for the most active
members [Fig. 14(a)]. Also, the correlations of sending and
receiving are linear, and the same holds for in- and
out-degree. However, the most active members again deviate,
with a low receiving part, i.e. both a low number of received
messages and a low in-degree, Fig. 15. Nevertheless, the
results
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Fig. 16. Probability densities of activities and degrees. The 
probabilities are plotted versus the total number of messages,
$M$, and the final degree, $K$, for (a) sending in QX and (b)
sending in POK. Panels (c) and (d) show the probability
densities of the total number of messages along directed links,
$M_L$, for QX and POK, respectively. The dotted lines serve as
guides to the eye and have the indicated slopes.

for both data sets are mainly consistent and the power-law
relation Eq. (11) is a remarkable regularity.

3.4.1 Activity and degree distributions 

Finally, we briefly discuss the distributions of activities
and degrees. If we assume $p(M) \sim M^{-\gamma_M}$ and
$p(K) \sim K^{-\gamma_K}$, then with Eq. (11) the exponents
should be related according to

$$ \gamma_K = 1 + (\gamma_M - 1)/\lambda . \qquad (12) $$
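Equation (12) follows from conservation of probability under the change of variables given by Eq. (11). A short derivation, assuming that the relation $K \sim M^{\lambda}$ holds deterministically, i.e. neglecting the scatter around it:

```latex
\begin{align}
  p(K)\,\mathrm{d}K &= p(M)\,\mathrm{d}M , \qquad M \sim K^{1/\lambda} , \\
  p(K) &\sim M^{-\gamma_M}\,\frac{\mathrm{d}M}{\mathrm{d}K}
        \sim K^{-\gamma_M/\lambda}\,K^{1/\lambda - 1}
        = K^{-\left[ 1 + (\gamma_M - 1)/\lambda \right]} ,
\end{align}
```

so that $\gamma_K = 1 + (\gamma_M - 1)/\lambda$. As a purely hypothetical numerical example, $\lambda = 3/4$ and $\gamma_M = 2$ would give $\gamma_K = 7/3$.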

Figure 16(a) and (b) displays the probability densities,
$p(M)$ and $p(K)$, for both online communities. Although the
distributions are rather broad, they do not exhibit straight
lines in double logarithmic representation. In panel (b) we
include some guides to the eye with slopes according to
Eq. (12), which roughly follow the obtained curves. However,
$\lambda \approx 0.75$ is relatively close to 1, so that the
differences are minor.

The probability densities of activity along directed links
are displayed in Fig. 16(c) and (d) for QX and POK,
respectively. In both cases the frequency of large activity
decays approximately following a power-law with exponent
around 3.5.



4 Conclusions 

Our work reviews and further supports previous empirical
findings [12], extending them by some features. The obtained
exponents are summarized in Table 1.



In addition to [12], we find very similar characteristics
for the passivity of receiving as for the activity of sending
messages. This is in line with the strong correlations
between individual sending and receiving, i.e. most of the
messages are replied to sooner or later. Furthermore, even
the communication between two individuals exhibits long-term
persistence.

Investigating the probability densities of logarithmic
growth rates (i.e. growth of the cumulative number of
messages between two time steps of any member), we are able
to collapse the curves by scaling them with conditional
average growth rates and conditional standard deviations.
While less active members follow the exponentially decaying
probability density well, for the more active members
deviations are found in the case of large growth rates.

Moreover, we introduce a new growth rate, namely the mutual
growth in the number of messages. This is the difference in
the number of messages sent between pairs of members at two
time steps. The conditional standard deviation of this
mutual growth rate also decays as a power-law with
increasing initial difference, where the exponent is close
to 0.3 and changes to 1/2 when the data are shuffled. We
conjecture that this growth reflects cross-correlations in
the activity.

Finally, we propose simulations to reproduce the long-term
correlations and growth properties. Basically, they consist
of generating long-term correlated sequences and defining a
threshold. Each value of such a sequence above the threshold
(POT) represents a message event. We show that the resulting
correlation and growth features, being determined by the
imposed fluctuation exponent, confirm the relation
$\beta = 1 - H$ [12]. Including further features, this
approach could be a starting point for more elaborate
modeling of human dynamics.

We would like to note that, except for Sec. 3.2.2 about
mutual growth in the number of messages (and Sec. 3.4), all
analyses and results refer to auto-correlations. As
phenomena, auto- and cross-correlations can occur
independently. However, since most of the messages are
replied to, it is very likely that there are also
cross-correlations between the members' activity, which to
our knowledge have not yet been studied systematically.

Thus, our work opens perspectives for further research
activities. In particular, the origin of the long-term
persistence in the communication remains an important
question. In I 1 we demonstrate the relation of $\beta$ and
$H$ with inter-event time scaling. From a
psychological/sociological point of view one may ask where
the persistence originates. Is it purely due to a state of
mind, solipsistic, emerging from moods, or is it due to
social effects, i.e. does the dynamics in the social network
induce persistent fluctuations? One hypothesis could be that
the social network itself is already correlated [55].
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Table 1. Overview of the obtained exponents. QX and POK are the two data-sets. For the BA model and the CP process see 
[50] and [27], respectively. $H_L$ is the fluctuation exponent along directed links, $\beta_k$ is the growth fluctuation exponent when the
degree is considered, and $\beta_X$ is the mutual growth fluctuation exponent based on the growth between pairs.
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